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METHOD FOR EXTRACTING INFORMATION FROM A DATABASE 



Technical field 

The present invention relates to a method for 
extracting information from a database- The database 
5 comprises a number of data tables containing values of a 
number of variables, each data table consisting of at 
least one data record including at least two of said 
values. The information is extracted by evaluation of at 
least one mathematical function/ which operates on one or 

10 more selected calculation variables- Further, the extrac- 
ted information is partitioned on one or more selected 
classification variables. 
Background of the invention 

It is often desired to extract specific information 

15 from a database that is stored on a secondary memory of a 
computer. More specifically, there is need to summarise a 
large amount of data in the database, and present the 
summarised data to a user in a lucid way. For example, a 
user might be interested in extracting total sales per 

20 year and client from a database including transaction 

data for a large company- Thus, the extraction involves 
evaluation of a mathematical function, e.g. a summation 
( xx SUM(x*y) ") , operating on a combination of calculation 
variables (x, y) , e.g. the number of sold items 

25 ("Number") and thd price per item ("Price") . The extrac- 
tion also involves partitioning the information according 
to classification variables, e.g. "Year" and "Client". 
Thus, the classification variables define how the result 
of the mathematical operation should be presented. In 

30 this specific case, the extraction of the total sales per 
year by client would involve evaluation of 
* v SUM(Number*Price) per Year, Client". 
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In one prior-art solution, a computer program is 
designed to process the database and to evaluate all 
conceivable mathematical functions operating on all 
conceivable calculation variables partitioned on all 
5 conceivable classification variables, also called 

dimensions . The result of this operation is a large data 
structure commonly known as a multidimensional cube. This 
multidimensional cube is obtained through a very time- 
consuming operation, which typically is performed over- 
10 night. The cube contains the evaluated results of the 
mathematical functions for every unique combination of 
the occurring values of the classification variables. The 
user can then, in a different computer program operating 
on the multidimensional cube, explore the data of the 
IS database, for example by visualising selected data in 

pivot tables or graphically in 2D and 3D charts. When the 
user defines a mathematical function and one or more 
classification variables, all other classification 
variables are eliminated through a summation over the 
20 results stored in the cube for this mathematical func- 
tion, the summation being made for all other classifi- 
cation variables. Thus, by adding or removing classifi- 
cation variables, the user can move up or down in the 
dimensions of the cube. 
25 This approach has some undesired limitations. If the 

multidimensional cube after evaluation contains average 
quantities, e.g. the average sales partitioned on a 
number of classification variables, the user cannot 
eliminate one or more of these classification variables 
30 since a summation over average quantities does not yield 
a correct total average. In this case, the ™^ dime »- 
sional cube must contain the average quantity split on 
every conceivable combination of classification variables 
as well, adding an extra complexity to the of 
35 building the multidimensional cube. The same problem 
arises for other quantities, e.g. median values. 
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Often it is difficult to predict all relevant mathe- 
matical functions, calculation variables and classifica- 
tion variables before making a first examination of the 
data in the database. Upon identifying trends and 
5 patterns, the user might find a need to add a function or 
a variable to reach underlying details in the data. Then, 
the time-consuming procedure of building a new multi- 
dimensional cube must be initiated . 
Summary of the invention 

10 Accordingly, the object of the present invention is 

to mitigate the above-mentioned drawbacks and to provide 
a method for extracting information from a database, 
which method allows the user to freely select mathemati- 
cal functions and incorporate calculation variables in 

15 these functions as well as to freely select classifica- 
tion variables for presentation of the results. 

This object is achieved by a method having the 
features recited in independent claim 1, Preferred 
embodiments are recited in the dependent ' claims . 

20 According to the present invention there is provided 

a method for generating a final data structure, i.e. a 
multidimensional cube, from data in a database in an 
efficient way, with respect to both process time and 
memory requirement. Since the cube can be generated much 

25 faster than in prior-art solutions, it is possible to 
generate multidimensional cubes ad hoc- The user can 
interactively define and generate a cube without being 
limited to a very small number of mathematical functions 
and variables. The mathematical function is normally 

30 composed of a combination of mathematical expressions. If 
the user needs to modify the mathematical function by 
changing, adding or removing an mathematical expression, 
a new cube can normally be generated in a time short 
enough not to disturb the user in his work. Similarly, if 

35 the user desires to add or remove a variable, the cube 
can be rapidly regenerated- 
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This is achieved by a clever grouping of all rele- 
vant data tables into boundary tables and connecting 
tables, respectively, based on the type of variables 
included in each table. By electing one of these tables 
5 as a starting point and by building an appropriate 

conversion structure, the final data structure can be 
efficiently generated from the starting table by use of 
the conversion structure. 

Preferably, the data records of the database are 
10 first read into the primary memory of a computer so that 
the data can be processed off-line. This will further 
reduce the time for searching the database and generating 
the final data structure. The database may be stored on a 
secondary memory or be a remotely stored database to 
15 which the computer is connected by a modem. It is to be 
understood that the database thus read into the primary 
memory may be a selected part of a larger database or a 
combination of two or more databases. 

In one preferred embodiment, each different value of 
20 each data variable is assigned a binary code and the data 
records are stored in binary-coded form. On account of 
the binary coding, very rapid searches can be conducted 
in the data tables. Moreover, redundant information can 
be removed, resulting in a reduced amount of data. 
25 in another preferred embodiment, all boundary and 

connecting tables that include calculation variables with 
a need for frequency data, i.e. variables for which the 
number of duplicates of each value is necessary for 
correct evaluation of the mathematical function, define a 
30 subset. By electing the starting table from this subset 
and by including frequency data in the conversion struc- 
ture, memory-efficient storage of duplicates can be 
achieved when building the final data structure. 

in the conversion structure, the frequency data can 
35 be included by duplication of each value, i.e. the con- 
version structure will contain a link from each value of 
a connecting variable in the starting table to a correct 
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number of each value of at least one corresponding 
selected variable in a boundary table. Alternatively, a 
counter may be included in the conversion structure for 
each unique value of each connecting variable in the 
starting table. 

Preferably, the boundary or connecting table having 
the largest nuraber of data records is elected as starting 
table. This tends to minimise the amount of frequency 
data that must be incorporated in the conversion struc- 
ture, which therefore can be built more rapidly. 

In a further preferred embodiment, a virtual data 
record is created by reading a data record of the 
starting table and by using the conversion structure to 
convert each value of each connecting variable in this 
15 data record into a value of at least one corresponding 

selected variable. Thereby, the virtual data record will 
contain a current combination of values of the selected 
variables. The final data structure can be gradually 
built by sequentially reading data records from the 
20 starting table, by updating the content of the virtual 
data record based on the content of each such data 
record, and by evaluating the mathematical function based 
on the content of each such updated virtual data record. 
This procedure minimises the amount of computer memory 
25 that is needed for extracting the requested information 

from the database. Further, virtual data records contain- 
ing undefined values, so-called NULL values, of any 
calculation variable can often be immediately removed, in 
particular when all calculation variables exhibit NULL- 
30 values, since in many cases such values can not be used 
in the evaluation of the mathematical function- This 
feature will contribute to an optimised performance. 

In another preferred embodiment, an intermediate 
data structure is built based on the content of the 
35 virtual data record. Each data record of the intermediate 
data structure contains a field for each selected classi- 
fication variable and an aggregation field for each 
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mathematical expression included in the mathematical 
function. For each updated virtual data record, each 
mathematical expression is evaluated and the result is 
aggregated in the appropriate aggregation field based on 
S the current value of each selected classification 

variable. Such an intermediate data structure allows the 
user to combine mathematical expressions with different 
need for frequency data in one mathematical function- By 
building several conversion structures incorporating 

10 corresponding frequency data, and by evaluating the data 
records of a starting table for each such mathematical 
expression based on a corresponding conversion structure, 
it is possible to merge the results of these evaluations 
in one intermediate data structure. Likewise, if the user 

15 modifies the mathematical function by adding a new mathe- 
matical expression operating on the already selected cal- 
culation variables, it is only necessary to add an 
aggregation field to the existing intermediate data 
structure, or to extend an existing aggregation field, • 

20 It should be noted that the virtual data record in 

general is indeed virtual, i.e. it is not physically 
allocated any memory, during the transition from a data 
record of the starting table to the final data structure. 
However, such a virtual data record can always, at least 

25 implicitly, be identified in the procedure of converting 
the content of a data record of the starting table into 
current values of the selected variables. 
Description of preferred embodiments 

The present invention will now be described by way 

30 of examples, reference being made to the tables of 

Appendix A and to Fig, 1 of the drawing, showing the 
content of a database after identification of relevant 
data tables according to the inventive method. 

A database, as shown in Fig. 1, comprises a number 

35 of data tables (Tables 1-5) . Each data table contains 

data values of a number of data variables. For example, 
in Table 1 each data record contains data values of the 
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data variables -Product", "Price" and -Part". If there is 
no specific value in a field of the data record, this 
field is considered to hold a NULL-value . Similarly, in 
Table 2 each data record contains values of the variables 
5 -Date", "Client", -Product" and -Number". Typically, the 
data values are stored in the form of ASCII-coded 
strings. 

The method according to the present invention is 
implemented by means of a computer program. In a first 
10 step, the program reads all data records in the database, 
for instance using a SELECT statement which selects all 
the tables of the database, i.e. Tables 1-5 in this case. 
Typically, the database is read into the primary memory 
of the computer. 
15 To increase the evaluation speed, it is preferred 

that each unique value of each data variable in said 
database is assigned a different binary code and that the 
data records are stored in binary-coded form. This is 
typically done when the program first reads the data 
20 records from the database. For each input table, the 

following steps are carried out. First the column names, 
i.e. the variables, of the table are successively read. 
Every time a new data variable appears, a data structure 
is instantiated for it. Then, an internal table structure 
25 is instantiated to contain all the data records in binary 
form, whereupon the data records are successively read 
and binary-coded. For each data value, the data structure 
of the corresponding data variable is checked to estab- 
lish if the value has previously been assigned a binary 
30 code. If so, that binary code is inserted in the proper 
place in the above-mentioned table structure. If not, the 
data value is added to the data structure and assigned a 
new binary code, preferably the next one in ascending 
order, before being inserted in the table structure. In 
other words, for each data variable, a unique binary code 
is assigned to each unique data value. 
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Tables 6-12 of Appendix A show the binary codes 
assigned to different data values of some data variables 
that are included in the database of Fig. 1. 

After having read all data records in the database , 
5 the program analyses the database to identify all connec- 
tions between the data tables. A connection between two 
data tables means that these data tables have one 
variable in common. Different algorithms for performing 
such an analysis are known in the art. After the analysis 
10 all data tables are virtually connected. In Fig. 1, such 
virtual connections are illustrated by double-ended 
arrows (a) . The virtually connected data tables should 
form at least one so-called snowflake structure, i.e. a 
branching data structure in which there is one and only 
15 one connecting path between any two data tables in the 
database- Thus f a snowflake structure does not contain 
any loops > If loops do occur among the virtually connec- 
ted data tables, e.g. if two tables have more than one 
variable in common, a snowflake structure can in some 
20 cases still be formed by means of special algorithms 
known in the art for resolving such loops. 

After this initial analysis, the user can start to 
explore the database. In doing so, the user defines a 
mathematical function, which could be a combination of 
25 mathematical expressions. Assume that the user wants to 
extract the total sales per year and client from the 
database in Fig. 1. The user defines a corresponding 
mathematical function "SUM(x*y)", and selects the calcu- 
lation variables to be included in this function: "Price" 
30 and "Number". The user also selects the classification 
variables: "Client'' and "Year". 

The computer program then identifies all relevant 
data tables, i.e. all data tables containing any one of 
the selected calculation and classification variables, 
35 such data tables being denoted boundary tables, as well 

as all intermediate data tables in the connecting path(s) 
between these boundary tables in the snowflake structure, 
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such data tables being denoted connecting tables. For the 
sake of clarity, the group of relevant data tables 
(Tables 1-3) is included in a first frame <A) in Fig. 1. 
Evidently, there are no connecting tables in this parti- 
5 cular case. 

In the present case, all occurrences of every value, 
i-e. frequency data, of the selected calculation 
variables must be included for evaluation of the mathe- 
matical function. In Fig. 1, the selected variables 
10 rPrice", "Number") requiring such frequency data are 

indicated by bold arrows (b) , whereas remaining selected 
variables are indicated by dotted lines (b' ) . Now, a 
subset (B) can be defined that includes all boundary 
tables (Tables 1-2) containing such calculation variables 
15 and any connecting tables between such boundary tables in 
the snowflake structure. It should be noted that the 
frequency requirement of a particular variable is deter- 
mined by the mathematical expression in which it is 
included- Determination of an average or a median calls 
for frequency information. In general, the same is true 
for determination of a sum, whereas determination of a 
maximum or a minimum does not require frequency data of 
the calculation variables. It can also be noted that 
classification variables in general do not require 
25 frequency data. 

Then, a starting table is elected, preferably among 
the data tables within subset (B) , most preferably the 
data table with the largest number of data records in 
this subset. In Fig. 1, Table 2 is elected as the start- 
30 ing table. Thus, the starting table contains selected 

variables (^Client", ^Number") , and connecting variables 
r v Date", "Product"). These connecting variables link the 
starting table (Table 2) to the boundary tables (Tables 1 
and 3) . 

35 Thereafter, a conversion structure is built, as 

shown in Tables 13 and 14. This conversion structure is 
used for translating each value of each connecting var- 
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iable ("Date", "Product") in the starting table (Table 2) 
into a value of a corresponding selected variable 
("Year", "Price") in the boundary tables (Table 3 and l, 
respectively) . Table 13 is built by successively reading 
5 data records of Table 3 and creating a link between each 
unique value of the connecting variable ("Date") and a 
corresponding value of the selected variable ("Year") . It 
can be noted that there is no link from value 4 ("Date: 
1999-01-12") , since this value is not included in the 
10 boundary table. Similarly, Table 14 is built by success- 
ively reading data records of Table 1 and creating a link 
between each unique value of the connecting variable 
("Product") and a corresponding value of the selected 
variable ("Price"). In this case, value 2 ("Product: 
15 Toothpaste") is linked to two values of the selected 
variable ("Price: 6.5"), since this connection occurs 
twice in the boundary table. Thus, frequency data is 
included in the conversion structure. Also note that 
there is no link from value 3 ("Product: Shampoo") . 
20 When the conversion structure has been built, a 

virtual data record is created. Such a virtual data 
record, as shown in Table 15, accommodates all selected 
variables ("Client", "Year", "Price", "Number") in the 
database. In building the virtual data record, a data 
25 record is first read from the starting table (Table 2) . 
Then, the value of each selected variable ("Client", 
"Number") in the current data record of the starting 
table is incorporated in the virtual data record. Also, 
by using the conversion structure (Tables 13-14) each 
30 value of each connecting variable ("Date", "Product") in 
the current data record of the starting table is con- 
verted into a value of a corresponding selected variable 
("Year", "Price"), this value also being incorporated in 
the virtual data record. 
35 At this stage, the virtual data record is used to 

build an intermediate data structure (Table 16) . Bach 
data record of the intermediate data structure accommo- 
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dates each selected classification variable (dimension) 
and an aggregation field for each mathematical expression 
implied by the mathematical function. The intermediate 
data structure (Table 16) is built based on the values of 
the selected variables in the virtual data record. Thus, 
each mathematical expression is evaluated based on one or 
more values of one or more relevant calculation variables 
in the virtual data record, and the result is aggregated 
in the appropriate aggregation field based on the combi- 
nation of current values of the classification variables 
( v Client", "Year") . 

The above procedure is repeated for all data records 
of the starting table. Thus, an intermediate data struc- 
ture is built by successively reading data records of the 
15 starting table, by incorporating the current values of 
the selected variables in a virtual data record, and by 
evaluating each mathematical expression based on the 
content of the virtual data record. If the current combi- 
nation of values of classification variables in the 
20 virtual data record is new, a new data record is created 
in the intermediate data structure to hold the result of 
the evaluation. Otherwise, the appropriate data record is 
rapidly found, and the result of the evaluation is aggre- 
gated in the aggregation field. Thus, data records are 
25 added to the intermediate data structure as the starting 
table is traversed. Preferably, the intermediate data 
structure is a data table associated with an efficient 
index system, such as an AVL structure. In most cases, 
the aggregation field is implemented as a summation 
30 register, in which the result of the evaluated mathe- 
matical expression is accumulated. In some cases, e.g. 
when evaluating a median, the aggregation field is 
instead implemented to hold all individual results for a 
unique combination of values of the specified classifica- 
35 tion variables. It should be noted that only one virtual 
data record is needed in the procedure of building the 
intermediate data structure from the starting table. 
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Thus, the content of the virtual data record is updated 
for each data record of the starting table. This will 
minimise the memory requirement in executing the computer 
program. 

5 The procedure of building the intermediate data 

structure will be further described with reference to 
Tables 15-16. In creating the first virtual data record 
Rl, as shown in Table 15, the values of the selected 
variables "Client" and "Number" are directly taken from 
10 the first data record of the starting table (Table 2). 
Then, the value "1999-01-02" of the connecting variable 
"Date" is transferred into the value "1999" of the 
selected variable "Year", by means of the conversion 
structure (Table 13) . Similarly, the value "Toothpaste" 
15 of the connecting variable "Product" is transferred into 
the value "6.5" of the selected variable "Price" by means 
of the conversion structure (Table 14), thereby forming 
the virtual data record Rl. Then, a data record is. 
created in the intermediate data structure, as shown in 
20 Table 16. In this case, the intermediate data structure 

has tree columns, two of which holds selected classifica- 
tion variables ("Client", "Year") . The third column holds 
an aggregation field, in which the evaluated result of 
the mathematical expression ("x*y") operating on the 
25 selected calculation variables ("Number", "Price") is 
aggregated. In evaluating virtual data record Rl, the 
current values (binary codes: 0,0) of the classification 
variables are first read and incorporated in this data 
record of the intermediate data structure. Then, the 
30 current values (binary codes: 2,0) of the calculation 
variables are read. The mathematical expression is 
evaluated for these values and added to the associated 
aggregation field. 

Next, the virtual data record is updated based on 
35 the starting table. Since the conversion structure (Table 
14) indicates a duplicate of the value "6.5" of the 
selected variable "Price" for the value "Toothpaste" of 
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the connecting variable "Product", the updated virtual 
data record R2 is unchanged and identical to Rl. Then, 
the virtual data record R2 is evaluated as described 
above, in this case, the intermediate data structure con- 
tains a data record corresponding to the" current values 
(binary codes: 0,0) of the classification variables. 
Thus, the evaluated result of the mathematical expression 
is accumulated in the associated aggregation field. 

Next, the virtual data record is updated based on 
the second data record of starting table, in evaluating 
this updated virtual data record R3, a new data record is 
created in the intermediate data structure, and so on. 

It should be noted that NULL values are represented 
by a binary code of -2 in this example. In the illustra- 
15 ted example, it should also be noted that any virtual 

data records holding a NULL value (-2) of any one of the 
calculation variables can be directly eliminated, since 
NULL values can not be evaluated in the mathematical 
expression ("x*y") . It should also be noted that all NULL '. 
values (-2) of the classification variables are treated 
as any other valid value and are placed in the inter- 
mediate data structure. 

After traversing the starting table, the inter- 
mediate data structure contains four data records, each 
25 including a unique combination of values (0,0; 1,0; 2,0; 
3,-2) of the classification variables, and the correspon- 
ding accumulated result (41; 37.5; 60; 75) of the evalua- 
ted mathematical expression. 

Preferably, the intermediate data structure is also 
30 processed to eliminate one or more classification 

variables (dimensions) . Preferably, this is done during 
the process of building the intermediate data structure, 
as described above. Every time a virtual data record is 
evaluated, additional data records are created, or found 
35 if they already exist, in the intermediate data struc- 
ture. Each of these additional data records is destined 
to hold an aggregation of the evaluated result of the 
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mathematical expression for all values of one or more 
classification variables. Thus, when the starting table 
has been traversed, the intermediate data structure will 
contain both the aggregated results for all unique 
5 combinations of values of the classification variables, 
and the aggregated results after elimination of each 
relevant classification variable. 

This procedure of eliminating dimensions in the 
intermediate data structure will be further described 
10 with reference to Tables 15 and 16. When virtual data 
record Rl is evaluated (Table 15) and the first data 
record (0,0) is created in the intermediate data struc- 
ture, additional data records are created in this struc- 
ture. Such additional data records are destined to hold 
15 the corresponding results when one or more dimensions are 
eliminated. In Table 16, a classification variable is 
assigned a binary code of -1 in the intermediate data 
structure to denote that all values of this' variable are 
evaluated. In this case, three additional data records 
20 are created, each holding a new combination of values 

(-1,0; 0,-1; -1,-1) of the classification variables. The 
evaluated result is aggregated in the associated aggrega- 
tion field of these additional data records. The first 
(-1,0) of these additional data records is destined to 
25 hold the aggregated result for all values of the classi- 
fication variable "Client" when the classification 
variable "Year" has the value "1999". The second (0,-1) 
additional data record is destined to hold the aggregated 
result for all values of the classification variable 
30 "Year" when the classification variable "Client" is 
"Nisse". The third (-1,-1) additional data record is 
destined to hold the aggregated result for all values of 
both classification variables "Client" and "Year". 

When virtual data record R2 is evaluated, the result 
35 is aggregated in the aggregation field associated with 

the current combination of values (binary codes: 0,0) of 
the classification variables, as well as in the aggre- 
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gation fields associated with relevant additional data 
records (binary codes: -1,0; 0,-1; -1,-1). When virtual 
data record R3 is evaluated, the result is aggregated in 
the aggregation field associated with the current combi- 
nation of values (binary codes: 1,0) of the classifica- 
tion variables. The result is also aggregated in the 
aggregation field of a newly created additional data 
record (binary codes: 1,-1) and in the aggregation fields 
associated with relevant existing data records (binary 
codes: -1,0; -1,-1) in the intermediate data structure. 

After traversing the starting table, the inter- 
mediate data structure contains eleven data records, as 
shown in Table 16. 

Preferably, if the intermediate data structure 
15 accommodates more than two classification variables, the 
intermediate data structure will, for each eliminated 
classification variable, contain the evaluated results 
aggregated over all values of this classification 
variable for each unique combination of values of remain- 
20 ing classification variables. 

When the intermediate data structure has been built, 
a final data structure, i.e. a multidimensional cube, as 
shown in non-binary notation in Table 17, is created by 
evaluating the mathematical function pS0M(x*y)") based 
25 on the results of the mathematical expression ( v x*y") 
contained in the intermediate data structure. In doing 
so, the results in the aggregation fields for each unique 
combination of values of the classification variables are 
combined. In the illustrated case, the creation of the 
final data structure is straightforward, due to the 
trivial nature of the present mathematical function. The 
content of the final data structure might then be presen- 
ted to the user in a two-dimensional table, as shown in 
Table 18. Alternatively, if the final data structure 
35 contains many dimensions, the data can be presented in a 
pivot table, in which the user interactively can move up 
and down in dimensions, as is well known in the art. 



30 
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Below, a second example of the inventive method is 
described with reference to Tables 20-29. The description 
will only elaborate on certain aspects of this example, 
namely building a conversion structure including data 
5 from connecting tables, and building an intermediate data 
structure for a more complicated mathematical function. 
In this example, the user wants to extract sales data per 
client from a database, which contains the data tables 
shown in Tables 20-23. For ease of interpretation, the 
10 binary coding is omitted in this example. 

The user has specified the following mathematical 
functions, for which the result should be partitioned per 
Client s 

a) " IF (Only (Environment index)*' I') THEN 

15 Sum (Number *Price)*2, ELSE Sum (Number* Price )) ", and 

b) w Avg(Nuraber*Price)" 

The mathematical function (a) specifies that the 
sales figures should be doubled for products that, belong 
to a product group having an environment index of , 
20 while the actual sales figures should be used for other 

products. The mathematical function (b) has been included 

for reference. 

In this case, the selected classification variables 
are "Environment index" and "Client", and the selected 

25 calculation variables are "Number" and "Price". Tables 

20, 22 and 23 are identified as boundary tables, whereas 
Table 21 is identified as a connecting table. Table 20 is 
elected as starting table. Thus, the starting table 
contains selected variables ("Number", "Client"), and a 

30 connecting variable ("Product") . The connecting variable 
links the starting table (Table 20) to the boundary 
tables (Tables 22-23), via the connecting table (Table 
21) . 

Next, the formation of the conversion structure will 
35 be described with reference to Tables 24-26. A first part 
(Table 24) of the conversion structure is built by 
successively reading data records of a first boundary 



12 MfiR '99 14:09 flWQMTENT A 

«*. t Patem^^ reg.verket 
1999 -OS- 1 2 

17 

table (Table 23) and creating a link between each unique 
value of the connecting variable ("Product group"") and a 
corresponding value of the selected variable ("Environ- 
ment index"). Similarly, a second part (Table 25) of the 
5 conversion structure is built by successively reading 
data records of a second boundary table (Table 22) and 
creating a link between each unique value of the connec- 
ting variable reprice group") and a corresponding value 
of the selected variable TPrice") . Then, data records of 
10 the connecting table (Table 21) is read successively. 

Each value of the connecting variables (^Product group" 
and w Price group", respectively) in Tables 24 and 25 is 
substituted for a corresponding value of a connecting 
variable ("Product") in Table 21. The result is merged in 
15 one final conversion structure, as shown in Table 26. 

Then, an intermediate data structure is built by 
successively reading data records of the starting table 
(Table 20), by using the conversion structure (Table 26) 
to incorporate the current values of the selected' vari- • 
20 ables ("Environment index", "Client", "Number", "Price") 
in the virtual data record, and by evaluating each mathe- 
matical expression based on the current content of the 
virtual data record. 

For reasons of clarity, Table 27 displays the 
25 corresponding content of the virtual data record for each 
data record of the starting table. As noted in connection 
with the first example, only one virtual data record is 
needed- The content of this virtual data record is 
updated, i.e. replaced, for each data record of the 
30 starting table. 

Each data record of the intermediate data structure, 
as shown in Table 28, accommodates a value of each 
selected classification variable ("Client", "Environment 
index") and an aggregation field for each mathematical 
35 expression implied by the mathematical functions. In this 
case, the intermediate data structure contains two 
aggregation fields. One aggregation field contains the 
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aggregated result of the mathematical expression r x x*y") 
operating on the selected calculation variables 
("Number", "Price") , as well as a counter of the number 
of such operations- The layout of this aggregation field 
5 is given by the fact that an average quantity should be 
calculated <"Avg (x*y) ") . The other aggregation* field is 
designed to hold the lowest and highest values of the 
classification variable "Environment index" for each 
combination of values of the classification variables. 

10 As in the first example, the intermediate data 

structure (Table 28) is built by evaluating the 
mathematical expression for the current content of the 
virtual data record (each row in Table 27), and by 
aggregating the result in the appropriate aggregation 

IS field based on the combination of current values of the 

classification variables ("Client", "Environment index") - 
The intermediate data structure also includes data 
. records in which the value "<ALL>" has been assigned to 
one or both of the classification variables- The corres- 

20 ponding aggregation fields contain the aggregated result 
when the one or more classification variables (dimen- 
sions) are eliminated . 

When the intermediate data structure has been built/ 
a final data structure, i*e« a multidimensional cube, is 

25 created by evaluating the mathematical functions based on 
the evaluated results of the mathematical expressions 
contained in the intermediate data structure. Each data 
■ record of the final data structure, as shown in Table 29, 
accommodates a value of each selected classification 

30 variable ("Client", "Environment index") and an aggre- 
gation field for each mathematical function selected by 
the user. 

The final data structure is built based on the 
results in the aggregation fields of the intermediate 
35 data structure for each unique combination of values of 
the classification variables. When function (a) is 
evaluated, by sequentially reading data records of Table 
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28, the program first checks if both values in the last 
column of Table 28 is equal to . If so, the relevant 
result contained in the first aggregation field of Table 
28 is multiplied by two and stored in Table 29. If not, 
5 the relevant result contained in the first aggregation 
field of Table 28 is directly stored in Table 29. When 
function (b) is evaluated, the aggregated result of the 
mathematical expression C'x+y") operating on the selected 
calculation variables (^Number", M Price") is divided by 
10 the number of such operations, both of which are stored 
in the first aggregation field of Table 28. The result is 
stored in the second aggregation field of Table 29. 

Evidently, the present invention allows the user to 
freely select mathematical functions and incorporate 
15 calculation variables in these functions as well as to 
freely select classification variables for presentation 
of the results. 

As an alternative, albeit less memory-efficient, to 
the illustrated procedure of building an intermediate 
!0 data structure based on sequential data records from the 
starting table, it is conceivable to first build a so- 
called join table. This join table is built by traversing 
all data records of the starting table and, by use of the 
conversion structure, converting each value of each 
5 connecting variable in the starting table into a value of 
at least one corresponding selected variable in a 
boundary table. Thus, the data records of the join table 
will contain all occurring combinations of values of the 
selected variables. Then, the intermediate data structure 
0 is built based on the content of the join table. For each 
record of the join table, each mathematical expression is 
evaluated and the result is aggregated in the appropriate 
aggregation field based on the current value of each 
selected classification variable. However, this alter- 
5 native procedure requires more computer memory to extract 
the requested information. 
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CLAIMS 



10 



15 



20 



25 



30 



1. A method for extracting information from a 
database, which comprises a number of data tables 
containing values of a number of variables, each data 
table consisting of at least one data record including at 
least two of said values, said information being 
extracted by evaluation of at least one mathematical 
function operating on one or more selected calculation 
variables, said extracted information being partitioned 
on one or more selected classification variables, 
characterised by the steps of: 

identifying all data tables containing at least one 
value of one of said selected variables, such data tables 
being boundary tables; 

identifying all data tables that, directly or 
indirectly, have variables in. common with said boundary 
tables and connect the same, such data tables being 
connecting tables; 

electing a starting table among said boundary and 
connecting tables; 

building a conversion structure that links values of 
each selected variable in said boundary tables to 
corresponding values of one or more connecting variables 
in said starting table; and 

evaluating said mathematical function for each data 
record of said starting table, by using said conversion 
structure to convert each value of each connecting 
variable into at least one value of at least one 
corresponding selected variable, such that said 
evaluation yields a final data structure containing a 
result of said mathematical function for every unique 
value of each classification variable. 

2. A method as set forth in claim 1, charac- 
terised by the further step of presenting relevant 
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parts of said resulting data structure to the user in 
human-readable form. 

3. A method as set forth in claim 1 or 2, cha- 
racterised by the further step of initially 
reading said data records of said database into the 
primary memory of a computer. 

4. A method as set forth in any one of the preceding 
claims, characterised by the further step of 
initially assigning a different binary code to each 
unique value of each data variable in said database and 
storing the data records in binary-coded form- 

5. A method as set forth in any one of the preceding 
claims, characterised by the further steps of 
initially identifying all data tables in said database 

15 that have variables in common, and assigning virtual 

connections between such data tables, thereby creating a 
database with a snowflake structure, wherein said 
connecting tables are located between said boundary 
tables in said snowflake structure. 

20 6. A method as set forth in any one of the preceding 

claims, characterised by the further steps of 
identifying all calculation variables for which the 
number of occurrences of each value is necessary for 
correct evaluation of said mathematical function, 

25 defining a subset of data tables consisting of boundary 
tables containing such variables and data tables 
connecting such boundary tables, electing said starting 
table from said subset, and including data on said number 
of occurrences of each value in said conversion 

30 structure. 

7. A method as set forth in any one of the preceding 
claims, characterised in that said starting 
table is the data table among said boundary and 
connecting tables having the largest number of data 

35 records. 

8. A method as set forth in any one of the preceding 
claims, characterised by the further step of 
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building said final data structure, which includes a 
number of data records, each of which contains a field 
for each selected classification variable and an 
aggregation field for said mathematical function, wherein 
5 said building step includes sequentially reading a data 
record of said starting table, creating a current 
combination of values of said selected variables by using 
said conversion structure to convert each value of each 
connecting variable in said data record into a value of 
10 at least one corresponding selected variable, evaluating 
said mathematical function for said current combination 
of values, and aggregating the result of said evaluation 
in the appropriate aggregation field based on the current 
value of each selected classification variable. 
15 9. A method as set forth in any one of claims 1-7, 

characterised by the further step of creating a 
virtual data record containing a combination of values of 
said selected variables, wherein said creating step 
includes reading a data record of said starting table and 
20 using said conversion structure to convert each value of 
each connecting variable in said data record into a value 
of at least one corresponding selected variable. 

10. A method as set forth in claim 9, charac- 
terised by the further step of building said final 
25 data structure which includes a number of data records, 
each of which contains a field for each selected 
classification variable and an aggregation field for said 
mathematical function, wherein said building step 
includes sequentially reading a data record of said 
30 starting table, updating the content of said virtual data 
record based on the content of each such data record, 
evaluating said mathematical function based on said 
updated virtual data record, and aggregating the result 
of said evaluation in the appropriate aggregation field 
35 based on the current value of each selected 

classification variable in said updated virtual data 
record. 
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11. A method as set forth in claim 9, charac- 
terised by the further step of building an 
intermediate data structure which includes a number of 
data records, each of which contains a field for each 
selected classification variable and an aggregation field 
for each mathematical expression implied by said 
mathematical function, wherein said building step 
includes sequentially reading a data record of said 
starting table, updating the content of said virtual data 
record based on the content of each such data record, 
evaluating each mathematical expression based on said 
updated virtual data record, and aggregating the result 
of said evaluation in an appropriate aggregation field 
based on the current value of each selected 
classification variable in said updated virtual data 
record . 

12. A method as set forth in claim 11, charac- 
terised in that said step of building said 
intermediate data structure includes: 

eliminating one of said classification variables in 
said intermediate data structure by aggregating said 
results over all values of said one classification 
variable for each unique combination of values of 
remaining classification variables, by creating 
additional data records, and by incorporating said 
aggregated results in said additional data records of 
said intermediate data structure. 

13. A method as set forth in claim 11 or 12, cha- 
racterised by the further step of evaluating said 
mathematical function based on said results in said 
aggregation fields for each unique combination of values 
of said classification variables, thereby building said 
final data structure. 

14. A method as set forth in any one of the 
preceding claims, characterised in that said 
step of building said conversion structure includes: 



12 MAR '99 14:12 P(WftPP>TE NJ_ * _ S. 30/34 



hkt Patent- dl^PverfcBt 

1999 -03- 1 2 



28 

a) reading data records of a boundary table, and 
creating a conversion structure including a link between 
each unique value of at least one connecting variable in 
said boundary table and each corresponding value of at 

5 least one selected variable therein; 

b) moving from said boundary table towards said 
starting table; 

c) if a connecting table is found, reading data 
records of said connecting table, and substituting each 

10 unique value of said at least one connecting variable in 
said conversion structure for at least one corresponding 
unique value of at least one connecting variable in said 
connecting table; and 

d) repeating steps (b)-(c) until said starting table 
15 is found. 

15. An article of manufacture comprising a computer- 
readable medium having stored thereon a computer program 
for effecting the steps of a method for extracting 
information from a database as set forth in any one of 
20 the preceding claims. 
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ABSTRACT 



A method operates on a database to extract and 
present information to a user. The database comprises 
data tables containing values of a number of variables. 
The information is to be extracted by evaluating at least 
one mathematical function which operates on one or more 
selected calculation variables. The presented information 
is to be partitioned on one or more selected classifica- 
tion variables. 

The method comprises the steps of identifying all 
boundary tables, i.e. data tables containing at least one 
value of one of the selected variables; identifying all 
connecting tables, i.e. data tables that have variables 
in common with the boundary tables and connect the same- 
electing a starting table among said boundary and connec- 
ting tables/ building a conversion structure that links 
values of each selected variable in the boundary tables 
to corresponding values of one or more connecting variab- 
les in the starting table; and evaluating the mathemati- 
cal function for each data record of the starting table, 
by using the conversion structure, such that the evalu- 
ation yields a final data structure containing a result 
of the mathematical function for every unique value of 
each classification variable. 
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